[CK_TILE][FMHA] Add new tile size for async (#3586)
Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
Performance improves by ~30% on gfx950 when block_num <= cu_num:

./bin/tile_example_fmha_fwd -h=8 -d=128 -s=512 -kname=1 -v=1
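The direction of the gain can be illustrated with a rough occupancy sketch. This is an assumption-laden model, not the kernel's actual dispatch logic: it assumes the grid size is `batch * nhead * ceil(seqlen / tile_m)`, and the CU count used below is illustrative rather than the real gfx950 figure. Halving the M tile from 128 to 64 doubles the number of workgroups, which matters exactly in the `block_num <= cu_num` regime where some CUs would otherwise sit idle.

```python
# Hedged sketch: why a smaller M tile can help when block_num <= cu_num.
# Grid-size model (batch * nhead * ceil(seqlen / tile_m)) and cu_num are
# assumptions for illustration only.
import math

def block_num(batch, nhead, seqlen, tile_m):
    # One workgroup per M tile per head per batch.
    return batch * nhead * math.ceil(seqlen / tile_m)

cu_num = 256  # illustrative compute-unit count, not the actual gfx950 value

for tile_m in (128, 64):
    n = block_num(batch=1, nhead=8, seqlen=512, tile_m=tile_m)
    print(f"tile_m={tile_m}: block_num={n}, under-occupied={n <= cu_num}")
```

For the benchmarked shape (`-h=8 -s=512`), the 128-row tile yields 32 workgroups while the 64-row tile yields 64, so the smaller tile keeps more CUs busy.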
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
This reverts commit f3aafb9.
* add new tile size for async
* Update example/ck_tile/01_fmha/codegen/ops/fmha_fwd.py
* fix lse error

Signed-off-by: Linjun-AMD <Jun.Lin@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Revert "Revert "[CK_TILE][FMHA] Add new tile size for async (#3586)" (#3613)" This reverts commit 8f75869. * Add new tile_size for async pipeline Signed-off-by: Linjun-AMD <Jun.Lin@amd.com> * Update include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_qr_ks_vs_async.hpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Signed-off-by: Linjun-AMD <Jun.Lin@amd.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Proposed changes
This PR adds a new tile size configuration for async pipeline operations in the FMHA (Fused Multi-Head Attention) forward pass implementation. The changes introduce support for a 64x128 tile configuration for specific head dimension scenarios and adjust the sequence tuning logic to accommodate this new tile size.
Changes:
Modified sequence tuning logic to include tile size 64 as a special case alongside the maximum tile size
Added filtering logic to exclude 64-size tiles for non-async pipelines with 128x128 head dimensions
Introduced a new 64x128x32 tile size configuration with compute unit constraint for 128x128 head dimensions
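The filtering rule above can be sketched as a small predicate in the style of the Python codegen (`fmha_fwd.py`). The function and parameter names here are hypothetical, chosen only to show the selection logic described in the change list, not the file's actual API:

```python
# Hedged sketch of the tile-selection rule described above.
# keep_tile and its parameters are illustrative names, not the real
# fmha_fwd.py codegen interface.
def keep_tile(tile_m, hdim_q, hdim_v, is_async):
    # The 64-wide M tile (the new 64x128x32 configuration) is only
    # emitted for the async pipeline at 128x128 head dimensions;
    # non-async pipelines skip it.
    if tile_m == 64 and (hdim_q, hdim_v) == (128, 128):
        return is_async
    # All other tile sizes are kept regardless of pipeline.
    return True

assert keep_tile(64, 128, 128, is_async=True)       # new async config kept
assert not keep_tile(64, 128, 128, is_async=False)  # filtered for non-async
assert keep_tile(128, 128, 128, is_async=False)     # existing tiles unaffected
```

The separate compute-unit constraint mentioned above (dispatching the 64x128x32 tile only when it helps occupancy) would be a runtime check in the kernel launcher rather than part of this compile-time filter.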
Checklist
Please put an `x` into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
- [ ] I have run `clang-format` on all changed files

Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered.